Kernel Machine Based Feature Extraction Algorithms for Regression Problems
Authors
Abstract
In this paper we consider two novel kernel machine based feature extraction algorithms in a regression setting. The first method is derived from the principles underlying the recently introduced Maximum Margin Discrimination Analysis (MMDA) algorithm. Here, however, it is shown that the orthogonalization principle employed by the original MMDA algorithm can be motivated using the well-known ambiguity decomposition, thus providing a firm ground for the good performance of the algorithm. The second algorithm combines kernel machines with average derivative estimation and is derived from the assumption that the true regressor function depends only on a subspace of the original input space. The proposed algorithms are evaluated in preliminary experiments conducted with artificial and real datasets.

1 FEATURE EXTRACTION BASED ON AMBIGUITY DECOMPOSITION

In this article we consider regression problems, where the data (X_i, Y_i) are independent, identically distributed random variables, L is a loss function, e.g. the quadratic loss L(y, z) = (y − z)^2, and we seek to determine the regressor f(x) = argmin_y E[L(Y, y) | X = x]. Let us first consider the model Y = ∑_i β_i g_i(X) + ε, where g_i : X → R are unknown functions and ε is a noise variable independent of X. We shall consider estimating the g_i by means of an iterative procedure. One view of the model is then to treat Y = β^T γ + ε as a linear regression problem, where γ = (g_1(X), . . . , g_m(X))^T.

1.1 Ambiguity decomposition

In this section we shall assume that the vector β satisfies 0 ≤ β_i ≤ 1 and β^T e = 1, where e = (1, 1, . . . , 1)^T, i.e., the output can be obtained as a noisy convex combination of the 'features' g_1(X), . . . , g_m(X). We shall further assume that the loss function is the quadratic loss. Let g = ∑_i β_i g_i and let f be arbitrary, where Loss(h) = E[(h(X) − f(X))^2] for any function h. Then it is not hard to see that

Loss(g) = ∑_i β_i Loss(g_i) − ∑_i β_i E[(g_i(X) − g(X))^2].

This formula, first given in [2], is called the "ambiguity decomposition" (AD). The ensemble loss can be decreased if the ambiguity of the ensemble (the second term) is maximized whilst keeping the losses of the individual members low. Now, we easily obtain

∑_i β_i E[(g_i(X) − g(X))^2] = ∑_i (β_i − β_i^2) E[g_i(X)^2] − ∑_{i≠j} β_i β_j E[g_i(X) g_j(X)].

1 Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary, email: [email protected]
2 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary, email: {kocsor,kkornel}@inf.u-szeged.hu
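One way to verify the ambiguity decomposition, sketched in LaTeX under the stated assumptions (quadratic loss, ∑_i β_i = 1, g = ∑_i β_i g_i):

\begin{align*}
\sum_i \beta_i \operatorname{Loss}(g_i) - \operatorname{Loss}(g)
  &= \sum_i \beta_i E\!\left[(g_i(X) - f(X))^2\right] - E\!\left[(g(X) - f(X))^2\right] \\
  &= \sum_i \beta_i E\!\left[g_i(X)^2\right] - E\!\left[g(X)^2\right] \\
  &= \sum_i \beta_i E\!\left[(g_i(X) - g(X))^2\right].
\end{align*}

The cross terms involving f cancel in the second step because ∑_i β_i = 1 and ∑_i β_i g_i = g; the last step uses the same two facts, since ∑_i β_i E[(g_i − g)^2] = ∑_i β_i E[g_i^2] − 2E[g^2] + E[g^2]. Rearranging gives the decomposition stated above.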
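As a quick numerical illustration (not from the paper; the sine/cosine features, the target f, and all variable names below are arbitrary choices), the decomposition can be checked with a short Python sketch:

import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 5                       # sample size and number of features g_i

X = rng.normal(size=n)
# Arbitrary illustrative 'features' g_i(X) and an arbitrary target f(X).
G = np.column_stack([np.sin((k + 1) * X) for k in range(m)])   # G[:, i] = g_i(X)
f = np.cos(X)

beta = rng.random(m)
beta /= beta.sum()                     # 0 <= beta_i <= 1, sum_i beta_i = 1

g = G @ beta                           # convex combination g = sum_i beta_i g_i

def loss(h):
    """Empirical version of Loss(h) = E[(h(X) - f(X))^2]."""
    return np.mean((h - f) ** 2)

member_losses = np.array([loss(G[:, i]) for i in range(m)])
ambiguity = np.mean((G - g[:, None]) ** 2, axis=0)   # E[(g_i(X) - g(X))^2]

lhs = loss(g)
rhs = beta @ member_losses - beta @ ambiguity
print(f"Loss(g) = {lhs:.10f}, decomposition = {rhs:.10f}")

Because the decomposition is a pointwise algebraic identity, the two printed numbers agree up to floating-point rounding for any choice of f, of the g_i, and of the convex weights β.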
Similar Resources
Comparative Analysis of Machine Learning Algorithms with Optimization Purposes
The fields of optimization and machine learning are increasingly intertwined, and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms run in reasonable computational time for specific classes of problems and play an important role in extracting knowledge from large amounts of data. In this paper, a methodology has been employed to opt...
Automatic Feature Selection via Weighted Kernels and Regularization
Selecting important features in non-linear kernel spaces is a difficult challenge in both classification and regression problems. We propose to achieve feature selection by optimizing a simple criterion: a feature-regularized loss function. Features within the kernel are weighted, and a lasso penalty is placed on these weights to encourage sparsity. We minimize this feature-regularized loss fun...
KNIFE: Kernel Iterative Feature Extraction
Selecting important features in non-linear or kernel spaces is a difficult challenge in both classification and regression problems. When many of the features are irrelevant, kernel methods such as the support vector machine and kernel ridge regression can sometimes perform poorly. We propose weighting the features within a kernel with a sparse set of weights that are estimated in conjunction w...
Kernel-based Fuzzy Feature Extraction Method and Its Application to Face Image Classification
The Hughes phenomenon (or the curse of dimensionality) suggests two essential directions for improving classification performance on high-dimensional, small sample size (SSS) problems. One is to reduce the dimensionality of the data by feature extraction or feature selection methods. The other is to increase the training sample size. In recent years some kernel-based feature extraction ...
Development of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug
Introduction: The use of artificial intelligence tools in pharmacogenomics is one of the newest fields of bioinformatics research. One of the most important drugs whose initial therapeutic dose is difficult to determine is the anticoagulant warfarin. Warfarin is an oral anticoagulant for which, due to its narrow therapeutic window and the complex interrelationships of individual factors, the selection of its ...
Journal:
Volume / Issue:
Pages: -
Publication date: 2004